Abstract

A path-integral formalism is proposed for studying the dynamical evolution in time of patterns
in an artificial neural network in the presence of noise. An effective cost function is
constructed which determines the unique global minimum of the neural network system. The
perturbative method discussed also provides a way of determining the storage capacity of the
network.
1. Introduction

Understanding the mechanism of learning and memory in biological systems and machines has long
been of interest. Studies of associative memory have sought to model the process of pattern
recognition and recall using specific cost functions.

The similarity of the McCulloch-Pitts neural network to the Ising spin system has enabled statistical
physics approaches [1-4] to be used to obtain information such as the storage capacity of the network.
In the "space of interactions" approach of Gardner et al. [2,3], an imbedding condition is postulated
and the energy function counts the number of weakly-imbedded pattern spins which have stability
less than a specified value. Though this approach gives a systematic study of a neural network
configuration at any given instant of time, it does not address the question of the time evolution of
the network configuration. Some years ago, Hertz et al. [5,6] studied the dynamics of learning in
a single-layer neural network using a Langevin equation for the evolution in time of the synaptic
efficacies. In these papers the authors investigated the role of noise in learning and studied the
possible phase transitions in the learning process.

In our work we take this viewpoint, of looking at the learning process as a non-equilibrium
stochastic process, as our starting point for constructing a path-integral framework
for studying neural network dynamics.

The problem of neural networks getting trapped in spurious states or local minima is well
known, and a method for getting directly to the global minimum of the network is highly desirable.
Of still greater interest is a systematic theory which gives a framework for determining the global
minimum of the neural network model, independent of the choice of the cost function.
An attempt has been made in this work to achieve this through the path-integral framework using
concepts from quantum and statistical field theory. We have considered a perceptron only for the
sake of simplicity.

In this framework it is seen that the patterns in the network settle from non-equilibrium states
into certain attractor states which correspond to those with lowest energy at equilibrium, that is,
for large values of the time. We construct an "effective cost function" for any cost function one starts
with, and discuss how the global minimum for a neural network can be determined and how one can
calculate the storage capacity of the network with this construction.
2. Langevin dynamics in a path-integral approach

Langevin dynamics has been applied to analyse disordered systems, in particular spin glasses
and retrieval processes in attractor neural network models with fixed weights, by a number of
authors [7]. In our work, we look at learning dynamics from a slightly different viewpoint: through
a path-integral framework, and with a dynamical evolution in time for the synaptic efficacies.
As in [5,6], we view the problem of learning in neural networks as a stochastic process and, for
simplicity, we look at the perceptron only, with one layer of connections. We postulate a stochastic
Langevin dynamics for the evolution in time of the synaptic efficacies $\omega_{ij}$:

$$\frac{\partial\omega_i}{\partial\tau} = -\gamma T\,\frac{\delta E}{\delta\omega_i} + \eta_i(\tau) \qquad (1)$$

where the input index $i$ takes values from 1 to $N$ and we have omitted the output index, which can
be treated separately. $\eta_i(\tau)$ stands for a random white noise source, $T$ is the noise level, $E$ is the cost
function and the parameter $\gamma$ describes the learning rate. The system can alternatively be thought
of as coupled to a heat reservoir at temperature $T$ (which represents the noise level); it evolves in
time $\tau$ until it reaches an equilibrium configuration at $\tau \to \infty$.

Here, and in the following, the space label has been suppressed and the index $i$ should be
interpreted to include the space variable also. We assume that the noise sources are Gaussian with the
correlations:

$$\langle\eta_i(\tau)\rangle = 0$$

$$\langle\eta_i(\tau)\,\eta_j(\tau')\rangle = 2\gamma T\,\delta_{ij}\,\delta(\tau-\tau') \qquad (2)$$

The values of the synaptic efficacies at any instant of time are determined by solving (1) and the
correlations between them can be calculated using (2).
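
As a minimal numerical sketch of the Langevin dynamics just described, the following integrates the
equation for a single synaptic weight with the Euler-Maruyama scheme. It assumes a drift term
$-\gamma T\,dE/d\omega$ and noise obeying (2); the quadratic cost, the parameter values and the function
name `simulate_langevin` are illustrative choices, not taken from the text. Under these assumptions the
long-time samples should be distributed approximately as $e^{-E(\omega)}$, that is, with mean
$\omega^*$ and variance $1/k$.

```python
import math
import random


def simulate_langevin(grad_E, omega0, gamma_T, dt, n_steps, seed=0):
    """Euler-Maruyama integration of d(omega)/d(tau) = -gamma_T * dE/d(omega) + eta(tau).

    The white noise eta is assumed Gaussian with variance 2 * gamma_T per unit time.
    """
    rng = random.Random(seed)
    omega = omega0
    noise_scale = math.sqrt(2.0 * gamma_T * dt)  # std of the integrated noise over one step
    trajectory = []
    for _ in range(n_steps):
        drift = -gamma_T * grad_E(omega)
        omega += drift * dt + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(omega)
    return trajectory


# Illustrative quadratic cost E(omega) = 0.5 * k * (omega - omega_star)**2.
k, omega_star = 4.0, 1.0
trajectory = simulate_langevin(lambda w: k * (w - omega_star),
                               omega0=0.0, gamma_T=0.5, dt=0.01, n_steps=200_000)

samples = trajectory[20_000:]  # discard the initial transient
mean = sum(samples) / len(samples)
variance = sum((w - mean) ** 2 for w in samples) / len(samples)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

The finite step size introduces a small bias in the sampled variance, so agreement with $1/k$ is only
expected to within a few percent.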

We wish to construct a partition function and a path-integral framework for this simplest type
of neural network which incorporates the time evolution of the synaptic strengths. Pattern recall
and recognition take place best when the Hamming distance between the target $\zeta^\mu$ for pattern
$\mu = 1, \ldots, p$ and the input pattern $\xi^\nu$ is minimal for $\mu = \nu$. This means that the synaptic strengths
$\omega_{ij}$ change in such a way as to descend the cost function surface.
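
The recall criterion stated here can be illustrated with a small sketch. The patterns, the probe and the
helper names `hamming_distance` and `recall` are hypothetical; the code simply returns the index of the
stored pattern closest in Hamming distance to a noisy input.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def recall(stored_patterns, probe):
    """Index mu of the stored pattern with minimal Hamming distance to the probe."""
    return min(range(len(stored_patterns)),
               key=lambda mu: hamming_distance(stored_patterns[mu], probe))


# Three illustrative +/-1 patterns and a probe equal to pattern 1 with one flipped entry.
patterns = [
    [+1, +1, -1, -1, +1, -1],
    [-1, +1, +1, -1, -1, +1],
    [+1, -1, +1, +1, -1, -1],
]
probe = [-1, +1, +1, -1, +1, +1]
best = recall(patterns, probe)
print("recalled pattern index:", best)
```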

In an artificial neural network, the problem of the system getting trapped in its various local
minima or spurious states is a familiar one. Here, prior knowledge of the global minimum for the
system and the synaptic efficacies corresponding to it would be very useful as it would save a great
deal of effort and computer time and enable greater efficiency in solving pattern recognition and
associative memory problems.



We adapt the procedure elucidated by Gozzi [8] in a different context to the neural
network system and write down a partition function for this system evolving in time through the
Langevin dynamics of (1):

$$Z[J] = \mathcal{N}\int \mathcal{D}\omega_i\,\mathcal{D}\eta_i\; e^{-\frac{1}{\gamma T}\int J_i(\tau')\,\omega_i(\tau')\,d\tau'}\,P(\omega(0))\,\delta(\omega_i - \omega_{\eta_i})\, e^{-\int \frac{\eta_i^2(\tau')}{4\gamma T}\,d\tau'} \qquad (3)$$

where $\mathcal{N}$ is a normalization constant, and we have introduced an external source $J_i(\tau)$ which
probes the fluctuations of the ergodic ensemble. At the end of the calculations, in the thermodynamic
limit $N \to \infty$, $J_i$ would be set to zero. $\omega_{\eta_i}$ is the solution of the Langevin equation (1) with
the initial probability distribution $P(\omega(0))$.

Using some algebraic manipulations we now rewrite this partition function in such a manner that 
the dynamical evolution of the network configuration in time becomes more apparent and it is seen 
that for large values of the time, the patterns evolve into the configuration corresponding to the 
ground state energy of the system. 



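From (1), we can write:

$$\delta(\omega_i - \omega_{\eta_i}) = \delta\!\left(\frac{\partial\omega_i}{\partial\tau} + \gamma T\,\frac{\delta E}{\delta\omega_i} - \eta_i\right)\left\|\frac{\delta\eta_i}{\delta\omega_j}\right\| \qquad (4)$$

where the Jacobian $\left\|\delta\eta_i/\delta\omega_j\right\|$ of the transformation $\eta_i \to \omega_i$ can be written as

$$\left\|\frac{\delta\eta_i}{\delta\omega_j}\right\| = \det\left[\left(\delta_{ij}\,\partial_\tau + \gamma T\,\frac{\delta^2 E}{\delta\omega_i(\tau)\,\delta\omega_j(\tau')}\right)\delta(\tau-\tau')\right] = \exp\,\mathrm{tr}\ln\,\partial_\tau\left(\delta_{ij}\,\delta(\tau-\tau') + \partial_\tau^{-1}\,\gamma T\,\frac{\delta^2 E}{\delta\omega_i(\tau)\,\delta\omega_j(\tau')}\right) \qquad (5)$$

and $\partial_\tau^{-1}$ satisfies:

$$\partial_\tau G(\tau-\tau') = \delta(\tau-\tau') \qquad (6)$$

The Jacobian (5) then reduces to:

$$\left\|\frac{\delta\eta_i}{\delta\omega_j}\right\| = \exp\left\{\mathrm{tr}\left[\ln\partial_\tau + \ln\left(\delta(\tau-\tau') + G(\tau-\tau')\,\gamma T\,\frac{\delta^2 E}{\delta\omega_j(\tau)\,\delta\omega_i(\tau')}\right)\right]\right\} \qquad (7)$$
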
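Since we are primarily interested in the dynamical evolution of the patterns of the neural network
forward in time, the Green's function $G(\tau-\tau')$ must satisfy

$$G(\tau-\tau') = \theta(\tau-\tau') \qquad (8)$$

Substituting for $G$ from (8) into (7), and expanding out the logarithm, we obtain

$$\left\|\frac{\delta\eta_i}{\delta\omega_j}\right\| = \exp\left(\theta(0)\,\gamma T\int_0^\tau d\tau'\,\frac{\delta^2 E}{\delta\omega_i^2(\tau')}\right) \qquad (9)$$

where an overall factor of $\mathrm{tr}\ln\partial_\tau$ in the exponential has been absorbed in the normalization. We
choose to work with the mid-point prescription $\theta(0) = 1/2$, which leads to

$$\left\|\frac{\delta\eta_i}{\delta\omega_j}\right\| = e^{\frac{\gamma T}{2}\int_0^\tau d\tau'\,\frac{\delta^2 E}{\delta\omega_i^2(\tau')}} \qquad (10)$$
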
From (1), (3) and (10) the partition function can be rewritten as



The $N$-point correlations between the synaptic efficacies can be calculated by taking the $N$-th
functional derivative of $Z[J]$ with respect to $J$. After making a change of scale of the time variable,
$\tau' \to \tau'' = 2\tau'$, we find that the partition function can be rewritten as

$$Z[J] = \mathcal{N}\int\mathcal{D}\omega_i\,P(\omega(0))\,e^{-\frac{1}{\gamma T}\int\mathcal{L}^{FP}\,d\tau'' - \frac{1}{\gamma T}\int J_i(\tau'')\,\omega_i(\tau'')\,d\tau''}\,e^{-\frac{E(\tau)}{2}}\,e^{\frac{E(0)}{2}} \qquad (12)$$

where we have defined a Fokker-Planck lagrangian as:



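$$\mathcal{L}^{FP} = \frac{1}{2}\left(\frac{d\omega_i}{d\tau''}\right)^2 + \frac{(\gamma T)^2}{8}\left(\frac{\delta E}{\delta\omega_i}\right)^2 - \frac{(\gamma T)^2}{4}\,\frac{\delta^2 E}{\delta\omega_i^2} \qquad (13)$$

It may be observed that this can be derived from a Fokker-Planck Hamiltonian $H^{FP}$ which obeys
the heat equation
(14)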



where 



(15) 
and



It is possible to find a series solution to (14) : 



^^""=7l^ho— +0(7-) (16) 



$$\Psi(\omega,\tau) = \sum_n c_n\,\psi_n\,e^{-2E_n\tau} \qquad (17)$$

where $c_n$ are normalizing constants and $E_n \geq 0$ are the energy eigenvalues of the operator equation:

$$H^{FP}\psi_n = E_n\psi_n \qquad (18)$$



It is easy to see that $H^{FP}$ is a positive semi-definite operator with its ground state $E_0 = 0$ defined
by $\psi_0 = e^{-E/2}$. In the equilibrium limit $\tau \to \infty$, only the ground state configuration contributes
to $P(\omega,\tau)$ and we have

$$\lim_{\tau\to\infty} P(\omega,\tau) = c_0\,e^{-E(\omega)} \qquad (19)$$
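
The statement that $e^{-E/2}$ is a zero-energy ground state can be checked numerically in one dimension.
The sketch below assumes the standard one-dimensional Fokker-Planck form
$H = -d^2/d\omega^2 + \frac{1}{4}(E')^2 - \frac{1}{2}E''$, with overall factors of $\gamma T$ scaled out;
this operator form, the double-well cost and the function names are assumptions made for illustration.

```python
import math


def E(w):
    """Illustrative double-well cost function."""
    return 0.25 * w ** 4 - 0.5 * w ** 2


def dE(w):
    return w ** 3 - w


def d2E(w):
    return 3.0 * w ** 2 - 1.0


def psi0(w):
    """Candidate ground state exp(-E/2)."""
    return math.exp(-0.5 * E(w))


def apply_H(psi, w, h=1e-3):
    """Apply H = -d^2/dw^2 + (E')^2 / 4 - E'' / 2 to psi at w (central differences)."""
    second_derivative = (psi(w + h) - 2.0 * psi(w) + psi(w - h)) / h ** 2
    potential = 0.25 * dE(w) ** 2 - 0.5 * d2E(w)
    return -second_derivative + potential * psi(w)


# The residual H psi0 should vanish up to finite-difference error.
residuals = [abs(apply_H(psi0, w)) for w in (-1.5, -0.5, 0.0, 0.7, 2.0)]
print("largest residual:", max(residuals))
```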

Interestingly, following Gozzi [9], it is also possible to define a canonical momentum $\Pi$ conjugate to
the variable representing the synaptic efficacy, from the Fokker-Planck lagrangian:

*' = ^ = 2^<--<' 
so that the partition function can be written in the form of the Gibbs average of equilibrium
statistical mechanics:

Z\J\=MW I 25a;i(0)I?r[i(0)e-/^*"'+^('^)+^^(^')'^'(^')>''^' (21) 
The advantage of writing $Z$ in this form is that the integration measure and $Z$ are independent of
$\tau$.

The correlations between the synaptic strengths can be calculated either using eq. (2) within the
Langevin approach, or using the equivalent Fokker-Planck equations. Since the noise sources are
delta-correlated, the correlations between the synaptic efficacies would be stationary for a proper
choice of the initial probability $P(\omega(0))$:

$$\langle\omega_{r_1}(\tau_1)\cdots\omega_{r_l}(\tau_l)\rangle_{\eta,P(\omega(0))} = F(\tau_1-\tau_2,\,\tau_2-\tau_3,\,\ldots,\,\tau_{l-1}-\tau_l) \qquad (22)$$

where $\langle\cdot\rangle_{\eta,P(\omega(0))}$ denotes the average over both $\eta$ and $\omega(0)$. Since the correlations depend only on the
time differences, these would remain invariant under a uniform translation in time:

$$\langle\omega_{r_1}(\tau_1)\cdots\omega_{r_l}(\tau_l)\rangle_{\eta,P(\omega(0))} = \langle\omega_{r_1}(\tau_1+t)\cdots\omega_{r_l}(\tau_l+t)\rangle_{\eta,P(\omega(0))} \qquad (23)$$

Setting all the $\tau_j$ equal and taking the limit $t \to \infty$ on both sides, we find that while the left hand side of
equation (23) is independent of $t$, the right hand side is the average with the equilibrium distribution
(19). From here it is clear that it is not necessary to take the $\tau \to \infty$ limit to get the equilibrium
steady state distribution: at every finite $\tau$, the stochastic correlations are already the steady-state
equilibrium ones. As the steady state equilibrium distribution for the neural network system
corresponds to its lowest energy state, it is clear that as $\tau \to \infty$, the patterns evolve into that having the
lowest energy, which acts as the attractor state for the dynamical system.

It is necessary to determine the global minimum of the cost function. We do this by constructing
an effective cost function $\Gamma_{eff}[\bar\omega]$ by performing a Legendre transformation which removes the
dependence on $J_i$ in favour of a dependence on the average $\bar\omega_i$ of the synaptic strengths:

$$\Gamma_{eff}[\bar\omega] = -\gamma T\ln Z[J] - \int_0^\tau J_i(\tau')\,\bar\omega_i(\tau')\,d\tau' \qquad (24)$$

where

$$\bar\omega_i(\tau') = -\gamma T\,\frac{\delta\ln Z[J]}{\delta J_i(\tau')} \qquad (25)$$

By construction, the exact effective cost function (24) is convex [10] and its global minimum gives 
the ground state energy of the neural network system. 
Equations (24) and (25) can be combined to give 

Here, just as in [7], the time interval between 0 and $\tau$ has been sliced into $N-1$ infinitesimal parts:

$$\mathcal{D}\omega_i = \lim_{N\to\infty}\prod_{l=1}^{N-1} d\omega_{\tau_l} \qquad (27)$$

$\omega_{\tau_l}$ being the configuration of the synaptic efficacies at time $\tau_l$.
As it is not possible in general to solve this exactly, we find a solution by assuming that we can
make an expansion of $\Gamma_{eff}[\bar\omega]$ in powers of a small parameter, which we take as $\gamma T$:

$$\Gamma_{eff}[\bar\omega] = \sum_{n=0}^{\infty}(\gamma T)^n\,\Gamma^{(n)}[\bar\omega] \qquad (28)$$

We discuss later how one can explicitly determine the value of the expansion parameter.
Substituting (28) into (26) and making a Taylor expansion of $\mathcal{L}^{FP}$ about $\bar\omega$ we obtain

(29) 

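where we have performed the shift:

$$(\gamma T)^{1/2}\,\tilde\omega_i = \omega_i - \bar\omega_i \qquad (30)$$

By substituting the expansion (28) on the left hand side of (29), we obtain

$$\Gamma^{(0)}[\bar\omega] = \mathcal{L}^{FP}(\bar\omega), \qquad \Gamma^{(1)}[\bar\omega] = \frac{1}{2}\ln\det\left[\frac{\delta^2\mathcal{L}^{FP}}{\delta\bar\omega_i^2}\right] \qquad (31)$$

Thus the global minimum for the neural network system can be found by minimising

$$\mathcal{L}^{FP}(\bar\omega) + \frac{\gamma T}{2}\ln\det\left[\frac{\delta^2\mathcal{L}^{FP}}{\delta\bar\omega_i^2}\right] + \cdots \qquad (32)$$

with respect to $\bar\omega_i$.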

In the equilibrium limit the $\dot\omega_i$ term in (13) would not contribute to $\mathcal{L}^{FP}$, so using (13) in (32) and
finding the root $\bar\omega_{i\,\min}$ of the equation







(33) 



which minimizes $\Gamma_{eff}$, it is possible to avoid the problem of spurious minima for any specific choice
of the cost function for a particular learning rule.

In this framework, the roots of (33) giving rise to the various local minima for the particular neural
network under consideration are the source of the spurious minima for the system. Since $\Gamma_{eff}[\bar\omega]$ is
convex, it is clear that by finding the root which gives the minimal value of the effective cost function,
one can immediately arrive at the global minimum of the system without getting any interference
from the spurious states.
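
A one-dimensional toy sketch of this selection step is given below. It is not the full construction of
the text: it assumes a hypothetical tilted double-well classical cost `V`, locates its stationary points
by bisection, and compares the minima using the first-order corrected value
$V + \frac{\gamma T}{2}\ln V''$, which mirrors the structure of (32) at leading order in $\gamma T$.
All names and numerical values are illustrative.

```python
import math


def V(w):
    """Illustrative tilted double-well cost with one local and one global minimum."""
    return (w * w - 1.0) ** 2 + 0.3 * w


def dV(w):
    return 4.0 * w * (w * w - 1.0) + 0.3


def d2V(w):
    return 12.0 * w * w - 4.0


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corrected_cost(w, gamma_T):
    """Classical cost plus the first-order fluctuation term (gamma_T / 2) * ln V''."""
    return V(w) + 0.5 * gamma_T * math.log(d2V(w))


# Scan a grid for sign changes of dV to bracket every stationary point.
grid = [-2.0 + 0.01 * n for n in range(401)]
stationary = [bisect_root(dV, a, b) for a, b in zip(grid, grid[1:])
              if (dV(a) < 0.0) != (dV(b) < 0.0)]

gamma_T = 0.1
minima = [w for w in stationary if d2V(w) > 0.0]  # discard maxima
global_min = min(minima, key=lambda w: corrected_cost(w, gamma_T))
print(f"minima: {[round(w, 4) for w in minima]}, global minimum at omega = {global_min:.4f}")
```

Enumerating all stationary points and comparing their corrected costs is what allows the local minimum
to be discarded without any descent dynamics getting trapped in it.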
As an example, consider the cost function for a perceptron discussed in [6]



1^ ^ j 3 

where the constant A was added to keep the connections from going to infinity. The external source
$J_i$ in our framework plays the role of the auxiliary field $h_i$ in [6].
The Fokker-Planck lagrangian in this case is:

^FP 1 •2 , {iT)- 



^ ( (a + ^ E E erer) - E ;^c^^r) ^ - f E + ^) (3^) 

Substituting from (35) into (12) and using the first relation in (25) we obtain, after setting
$J = 0$ in the equilibrium limit,



-1 

This is in agreement with the result obtained in [6]. 



From the fluctuation-response theorem, the response function at equilibrium is just the full
connected propagator $G$, which is given by:

where $G_0$ is the tree-level propagator and one can calculate the self-energy $\Sigma$ using diagrammatic
methods [5,6]. It was shown in [5,6] that the self-energy is given by:

" " (38) 



1 + G 

where $\alpha = p_{max}/N$ is the storage capacity of the network. In these papers the authors calculated
the storage capacity of the network at equilibrium using diagrammatic methods.

Since we have $G^{-1} = (\gamma T)^{-1}\,\delta^2\Gamma_{eff}/\delta\bar\omega_i\,\delta\bar\omega_j$, we can determine the storage capacity analytically from:

Since we have assumed that $\gamma T$ is a small parameter, it is sufficient to work up to first order in the
$\gamma T$ expansion of $\Gamma_{eff}$. The steady-state equilibrium limit $\tau \to \infty$ corresponds to the most ordered
state, when the patterns have settled into their attractor states. In the phase space of synaptic
strengths, it is thus a state of minimum symmetry or zero entropy. This means that in the
equilibrium limit,
$$\frac{\partial}{\partial T}\left(-\gamma T\ln Z[J]\right) = \frac{\partial}{\partial T}\left(\Gamma_{eff}[\bar\omega] + \int J_i(\tau')\,\bar\omega_i(\tau')\,d\tau'\right) = 0 \qquad (40)$$

In the thermodynamic limit $N \to \infty$, $J$ is set to zero, and we get the result that for the attractor
states:

$$\frac{\partial\,\Gamma_{eff}[\bar\omega]}{\partial T} = 0 \qquad (41)$$

which shows that in this limit the effective cost function is stable with respect to changes in the
noise level.

Condition (41) can be used to determine the value of the quantity $\gamma T$ at which the system settles
into an attractor state. For the cost function (34), for example, one obtains:



'yT: 



V3 



lndet(A + iE^.E.ere^ 



1/2 



(42) 



AT L^fii' 

Applying equations (32) and (39) to the example of (34), and using eq. (35), we obtain the following
result for $\alpha$:



\ i \xv J 



(43) 



The value of $\gamma T$ obtained in (42) can then be substituted into (43) so that in the thermodynamic
limit one obtains the result



^/3Vlndet(A+^E^.EiCre)y ^ ^ 

(indet(A+'iE,lE.^rer)) ^^^^5?^^'^'^^ ^^"^ 

for the storage capacity of the network described by the cost function (34). 

Although in the pseudo-inverse learning rule considered in the example above the relation between
the input and the output is linear, our construction of the effective cost function and our method
for calculating the storage capacity of the network are applicable also to models with a non-linear
input-output relation such as:

$$\zeta_i^\mu = f\Big(\sum_j \omega_{ij}\,\xi_j^\mu\Big) \qquad (45)$$

where $f(x)$ is a non-linear function of $x$. The procedure itself at no stage depends on any particular
model and is independent of whether the input-output relation is linear or non-linear.
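
For concreteness, the input-output relation of a single output unit can be sketched as follows. The
function name `perceptron_output`, the example weights and the choice of `math.tanh` as the non-linear
function are assumptions made for illustration; passing the identity recovers the linear case.

```python
import math


def perceptron_output(weights, pattern, f=math.tanh):
    """Output f(sum_j w_j * xi_j) of one output unit for a given input pattern."""
    if len(weights) != len(pattern):
        raise ValueError("weights and pattern must have the same length")
    local_field = sum(w * xi for w, xi in zip(weights, pattern))
    return f(local_field)


weights = [0.5, -0.25, 0.75]
pattern = [+1, -1, +1]
linear_output = perceptron_output(weights, pattern, f=lambda x: x)
nonlinear_output = perceptron_output(weights, pattern)
print(linear_output, nonlinear_output)
```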
In this manner it is possible to calculate the storage capacity for any neural network system, in a
completely analytical manner, without having to resort to diagrammatic methods, and independent
of the choice of the cost function. It may be observed that the procedure we have set up in the
foregoing allows the explicit calculation of the synaptic efficacies $\omega_{ij}$ from the input $\xi_j^\mu$ and the output $\zeta_i^\mu$.

Discussion 

We have shown that a neural network system can be viewed as a non-equilibrium stochastic system
of synaptic efficacies which evolves at very large times into an equilibrium configuration
having the lowest energy, which acts as the attractor state of the network.

An effective cost function is constructed and a perturbative scheme is developed for calculating it.
The global minimum of the effective cost function can be determined and gives the exact ground
state energy of the system. It is shown that in the thermodynamic limit, the effective cost function
is invariant under changes in the noise level. In the perturbation expansion we have constructed for
$\Gamma_{eff}$, we have assumed that the expansion parameter $\gamma T$ is small. This can be ensured by always
keeping the noise level low so that $\gamma T \ll 1$.

In this paper we have constructed a path-integral framework for the simplest case of a perceptron. It
should be possible to generalize this construction to the more realistic case of many-layered neural
networks and to arrive at their global minima directly, without having to deal with the
various local minima.